Background

The weather and climate variables used in this analysis are calculated from Daymet. Vegetation functional composition data from FIA, AIM, and the LANDFIRE reference database are averaged to the same spatial scale as Daymet. The resulting dataset therefore contains climate and weather information for every Daymet cell that has vegetation functional composition data, in each year when those vegetation data were collected.
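
As a rough illustration of the averaging step described above, the sketch below averages plot-level cover to one value per grid cell and year. The column names (`cell_id`, `shrub_cov`) and all values are made up for illustration, and this is not the code used to build the dataset.

```r
# Made-up plot-level records; `cell_id` is a hypothetical Daymet grid cell identifier
plots <- data.frame(
  cell_id   = c("A", "A", "B", "B", "B"),
  Year      = c(2015, 2015, 2015, 2016, 2016),
  shrub_cov = c(20, 30, 5, 10, 14)   # hypothetical percent cover values
)

# Average all plots that fall in the same grid cell in the same year
cell_means <- aggregate(shrub_cov ~ cell_id + Year, data = plots, FUN = mean)
cell_means
```

Each row of `cell_means` then corresponds to one cell-year, which is the unit that the climate and weather variables are attached to.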

For each Daymet grid cell in each year when vegetation data were collected, we calculated the following set of climate variables (average values over the previous years; as few as 5 and as many as 30 years) and weather anomaly variables (the average of values over the previous 3 years relative to the climate average). The table below indicates how each variable is summarized across the climate or weather anomaly years, its units, and how the recent weather anomaly is calculated relative to the climate value.

| Variable | Units | Anomaly_Format |
| --- | --- | --- |
| Average annual minimum daily temperature - Mean across years | Degrees Celsius | absolute difference |
| Average annual maximum daily temperature - Mean across years | Degrees Celsius | absolute difference |
| Average annual mean daily temperature - Mean across years | Degrees Celsius | absolute difference |
| Total annual daily precipitation - Mean across years | mm | % difference |
| Temperature of the warmest month - Mean across years | Degrees Celsius | absolute difference |
| Temperature of the coldest month - Mean across years | Degrees Celsius | absolute difference |
| Precipitation of wettest month - Mean across years | mm | % difference |
| Precipitation of driest month - Mean across years | mm | % difference |
| Precipitation seasonality - Mean across years | mm | % difference |
| Correlation of monthly precipitation and temperature - Mean across years | correlation | absolute difference |
| Month when temperature first is above freezing - Mean across years | numerical month | absolute difference |
| Isothermality - Mean across years | isothermal ratio | absolute difference |
| Annual water deficit - Mean across years | mm of water/degrees Celsius | % difference |
| Annual wet degree days - Mean across years | Degree days | % difference |
| Annual mean vapor pressure deficit - Mean across years | millibars | absolute difference |
| Annual maximum vapor pressure deficit - Mean across years | millibars | absolute difference |
| Annual minimum vapor pressure deficit - Mean across years | millibars | absolute difference |
| Number of frost-free days - Mean across years | days | absolute difference |
| Annual maximum vapor pressure deficit - 95th percentile across years | millibars | absolute difference |
| Annual water deficit - 95th percentile across years | mm of water/degrees Celsius | % difference |
| Annual wet degree days - 5th percentile across years | Degree days | % difference |
| Number of frost-free days - 5th percentile across years | days | absolute difference |
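
To make the two anomaly formats in the table concrete, here is a minimal sketch. The helper `calc_anomaly()` and the annual values are hypothetical. We also assume here that the anomaly is computed as recent minus climate and that the climate window includes the most recent years; neither detail is specified above.

```r
# Assumed helper (not from the original analysis): anomaly of the most recent
# `n_recent` years relative to the mean of the full climate window.
calc_anomaly <- function(x, n_recent = 3, format = c("absolute", "percent")) {
  format  <- match.arg(format)
  climate <- mean(x)                  # climate average over all supplied years
  recent  <- mean(tail(x, n_recent))  # recent weather average
  if (format == "absolute") {
    recent - climate                     # same units as x
  } else {
    100 * (recent - climate) / climate   # percent of the climate average
  }
}

# Illustrative annual values for one grid cell (made-up numbers)
tmean_by_year <- c(10.2, 10.8, 9.9, 10.5, 11.0, 10.4, 11.3, 11.6, 11.1, 11.8)
prcp_by_year  <- c(420, 380, 450, 400, 390, 410, 360, 340, 370, 330)

tmean_anom <- calc_anomaly(tmean_by_year, format = "absolute")  # degrees Celsius
prcp_anom  <- calc_anomaly(prcp_by_year,  format = "percent")   # percent
```

In this toy series the recent years are warmer and drier than the long-term average, so `tmean_anom` is positive and `prcp_anom` is negative.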

Next, we look at the distribution of the input values.

dat %>% 
  select(tmean_meanAnnAvg_3yrAnom:durationFrostFreeDays_meanAnnAvg_3yrAnom, Year, Long, Lat) %>% 
  pivot_longer(cols = tmean_meanAnnAvg_3yrAnom:durationFrostFreeDays_meanAnnAvg_3yrAnom,
               names_to = "variable",
               values_to = "value"
                ) %>% 
  ggplot() +
  facet_wrap(~variable, scales = "free") +
   geom_histogram(aes(value))

For each grid cell, we also include soil information, which is extrapolated from the SOLUS100 dataset. This information does not vary over time in our dataset. The soil variables are listed below.

Variable
soil depth
percentage of clay in the soil surface (0-3 cm)
average percentage of sand across the soil profile
average percentage of coarse fragments across the soil profile
percentage of organic matter in the soil surface (0-3 cm)
total available water holding capacity

Identify un-correlated climate and soils predictors – uniquely for each ecoregion

First, we want to identify a set of candidate climate and soils predictors that are not strongly correlated with one another, which we then use as predictors in models that estimate vegetation composition.

We do this first for all of CONUS, then independently for each of two ecoregions (shrub/grassland and forest).
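
The screening step can be sketched with `caret::findCorrelation()`, which appears (commented out) in the chunks below with a cutoff of 0.7. The toy data frame here is simulated and only reuses a few of the variable names for illustration. It is a sketch of the general approach, not a reproduction of the actual selection.

```r
library(caret)

set.seed(1)
n <- 500
tmean <- rnorm(n, mean = 10, sd = 5)

# Simulated predictors: tmax is built to be almost collinear with tmean
toy <- data.frame(
  tmean = tmean,
  tmax  = tmean + 6 + rnorm(n, sd = 0.5),
  prcp  = rnorm(n, mean = 400, sd = 100),
  sand  = runif(n, min = 10, max = 80)
)

# Flag variables to drop so that no remaining pair exceeds |r| = 0.7
corMat <- cor(toy)
drop   <- caret::findCorrelation(corMat, cutoff = 0.7, names = TRUE, exact = TRUE)
keep   <- setdiff(names(toy), drop)
```

The function suggests which member of a correlated pair to drop, but the suggestion can be overridden, for example to keep the variable that is easier to calculate.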

CONUS

First, get data for all CONUS sites (and drop NAs)

#Here is the correlation structure of all of the potential input climate and soils variables
(corrPlot <- 
   datCONUS %>% 
      dplyr::select(
       "tmin", 
      "tmax", 
      "tmean",
      "prcp",
      "t_warm",
      "t_cold",
      "prcp_wet",
      "prcp_dry",
      "prcp_seasonality",
      "prcpTempCorr",
      "abvFreezingMonth",
      "isothermality",
      "annWatDef",
      "annWetDegDays",
      "VPD_mean",
      "VPD_max",
      "VPD_min",
      "VPD_max_95",
      "annWatDef_95",
      "annWetDegDays_5",
      "frostFreeDays_5",
      "frostFreeDays",
      "soilDepth",
      "clay",
      "sand",
      "coarse",
      "carbon",
      "AWHC"
   ) %>% 
    slice_sample(n = 5e3) %>% 
   #cor()  %>% 
   #caret::findCorrelation(cutoff = .7, verbose = TRUE, names = TRUE, exact = TRUE)) 
   ggpairs(upper = list(continuous = my_fn, size = .1),
           lower = list(continuous = GGally::wrap("points", alpha = 0.1, size = .1)),
           progress = FALSE)) # + ggtitle("Correlation of CONUS Predictor Variables")

# corrPlot
# dev.off()

The following variables were removed due to correlation with other variables – first pass (variables in parentheses are kept):

  • tmin, tmax, t_warm, t_cold (tmean)
  • prcp_wet (prcp)
  • frostFreeDays, frostFreeDays_5, abvFreezingMonth (tmean)
  • VPD_mean, VPD_min, VPD_max_95 (VPD_max)
  • soilDepth (AWHC)

Then, we removed the following additional variables based on their correlation:

## All correlations <= 0.7
## character(0)
  • ‘annWetDegDays_5’ and ‘annWetDegDays’: cor = .98; chose ‘annWetDegDays’
  • ‘annWatDef_95’ and ‘annWatDef’: cor = .98; chose ‘annWatDef’
  • ‘annWatDef’ and ‘prcp_seasonality’: cor = .78; chose ‘prcp_seasonality’
  • ‘prcp_dry’ and ‘prcp_seasonality’: cor = .79; chose ‘prcp_seasonality’
  • ‘tmean’ and ‘VPD_max’: cor = .93; chose ‘tmean’ (opposite of recommendation by function, but tmean is easier to calculate)
  • ‘sand’ and ‘clay’: cor = .70; chose ‘sand’
  • ‘carbon’: also removed, since it is highly skewed
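
Pairwise decisions like those listed above are easier to make with an explicit list of the variable pairs that exceed the cutoff. The helper below, `high_cor_pairs()`, is a hypothetical base-R sketch that runs on simulated data. It is not part of the original analysis.

```r
# Hypothetical helper: list every pair of columns whose absolute correlation
# exceeds `cutoff`, strongest first
high_cor_pairs <- function(df, cutoff = 0.7) {
  cm  <- cor(df, use = "pairwise.complete.obs")
  # upper.tri() keeps each pair once and excludes the diagonal
  idx <- which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    cor  = cm[idx],
    stringsAsFactors = FALSE
  )
  out[order(-abs(out$cor)), ]
}

# Simulated soils data: clay is constructed to trade off against sand
set.seed(42)
n    <- 300
sand <- runif(n, min = 10, max = 80)
toy  <- data.frame(
  sand   = sand,
  clay   = 90 - sand + rnorm(n, sd = 5),
  coarse = runif(n, min = 0, max = 40)
)

flagged <- high_cor_pairs(toy, cutoff = 0.7)
flagged
```

For each flagged pair, one variable is then kept, either the one recommended by `findCorrelation()` or the one that is easier to calculate or interpret.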

The final variables selected are shown below

Shrubland and Grassland

First, get data just for grassland/shrubland sites (and drop NAs)

#Here is the correlation structure of all of the potential input climate and soils variables
(corrPlot <- 
   datGrassShrub %>% 
      dplyr::select(
       "tmin", 
      "tmax", 
      "tmean",
      "prcp",
      "t_warm",
      "t_cold",
      "prcp_wet",
      "prcp_dry",
      "prcp_seasonality",
      "prcpTempCorr",
      "abvFreezingMonth",
      "isothermality",
      "annWatDef",
      "annWetDegDays",
      "VPD_mean",
      "VPD_max",
      "VPD_min",
      "VPD_max_95",
      "annWatDef_95",
      "annWetDegDays_5",
      "frostFreeDays_5",
      "frostFreeDays",
      "soilDepth",
      "clay",
      "sand",
      "coarse",
      "carbon",
      "AWHC"
   ) %>% 
    slice_sample(n = 5e3) %>% 
   #cor()  %>% 
   #caret::findCorrelation(cutoff = .7, verbose = TRUE, names = TRUE, exact = TRUE)) 
   ggpairs(upper = list(continuous = my_fn, size = .1),
           lower = list(continuous = GGally::wrap("points", alpha = 0.1, size = .1)),
           progress = FALSE) # + ggtitle("Correlation of Grass/Shrubland Predictor Variables")
 )

# bmp( here("Figures", "CoverDatFigures", "GrassShrubClimVarCorrelations.bmp"), width = 2500, height = 2500)
# corrPlot
# dev.off()

The following variables were removed due to correlation with other variables – first pass (variables in parentheses are kept):

  • tmin, tmax, t_warm, t_cold (tmean)
  • prcp_wet (prcp)
  • frostFreeDays, frostFreeDays_5, abvFreezingMonth (tmean)
  • VPD_mean, VPD_min, VPD_max_95 (VPD_max)
  • soilDepth (AWHC)

Then, we removed the following additional variables based on their correlation:

## All correlations <= 0.7
## character(0)
  • ‘annWetDegDays_5’ and ‘annWetDegDays’: cor = .95; chose ‘annWetDegDays’
  • ‘annWetDegDays’ and ‘prcp’: cor = .86; chose ‘prcp’
  • ‘annWatDef’ and ‘annWatDef_95’: cor = .97; chose ‘annWatDef’ (opposite of recommendation by function, but annWatDef is easier to calculate)
  • ‘prcp’ and ‘prcp_dry’: cor = .78; chose ‘prcp’ (opposite of recommendation by function, but prcp is easier to calculate)
  • ‘tmean’ and ‘VPD_max’: cor = .94; chose ‘tmean’ (opposite of recommendation by function, but tmean is easier to calculate)
  • ‘clay’ and ‘sand’: cor = .79; chose ‘sand’
  • ‘annWatDef’ and ‘prcp_seasonality’: cor = .83; chose ‘prcp_seasonality’

The final variables selected are shown below

Forests

First, get data just for forest sites (and drop NAs)

#Here is the correlation structure of all of the potential input climate and soils variables
(corrPlot <- 
   datForest %>% 
      dplyr::select(
       "tmin", 
      "tmax", 
      "tmean",
      "prcp",
      "t_warm",
      "t_cold",
      "prcp_wet",
      "prcp_dry",
      "prcp_seasonality",
      "prcpTempCorr",
      "abvFreezingMonth",
      "isothermality",
      "annWatDef",
      "annWetDegDays",
      "VPD_mean",
      "VPD_max",
      "VPD_min",
      "VPD_max_95",
      "annWatDef_95",
      "annWetDegDays_5",
      "frostFreeDays_5",
      "frostFreeDays",
      "soilDepth",
      "clay",
      "sand",
      "coarse",
      "carbon",
      "AWHC"
   ) %>% 
    slice_sample(n = 5e3) %>% 
   #cor()  %>% 
   #caret::findCorrelation(cutoff = .7, verbose = TRUE, names = TRUE, exact = TRUE)) 
   ggpairs(upper = list(continuous = my_fn, size = .1),
           lower = list(continuous = GGally::wrap("points", alpha = 0.1, size = .1)),
           progress = FALSE) # + ggtitle("Correlation of Forest Predictor Variables")
 )

bmp( here("Figures", "CoverDatFigures", "ForestClimVarCorrelations.bmp"), width = 2500, height = 2500)
corrPlot
dev.off()
## quartz_off_screen 
##                 2

The following variables were removed due to correlation with other variables – first pass (variables in parentheses are kept):

  • tmin, tmax, t_warm, t_cold (tmean)
  • prcp_wet (prcp)
  • frostFreeDays, frostFreeDays_5, abvFreezingMonth (tmin)
  • VPD_mean, VPD_min, VPD_max_95 (VPD_max)
  • prcp_seasonality, annWatDef_95 (annWatDef)
  • annWetDegDays_5 (annWetDegDays)
  • soilDepth (AWHC)

Then, we removed the following additional variables based on their correlation:

## All correlations <= 0.7
## character(0)
  • ‘annWetDegDays’ and ‘tmean’: cor = .88; chose ‘tmean’
  • ‘tmean’ and ‘VPD_max’: cor = .94; chose ‘tmean’ (opposite of what function recommends, but tmean is easier to calculate)

The final variables selected are shown below

Then, identify weather anomaly predictors that are uncorrelated with the climate and soils predictors identified above – CONUS-wide, then uniquely for each ecoregion
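
One way to check anomaly variables against the already-selected climate and soils predictors is a cross-correlation matrix between the two sets. The sketch below uses simulated data, and `fake_anom` is a deliberately constructed placeholder. We assume the same 0.7 cutoff as in the previous step, and the sketch is not the code used for the results that follow.

```r
set.seed(7)
n <- 400

# Simulated stand-ins for the selected climate predictors
climate <- data.frame(
  tmean = rnorm(n, mean = 10, sd = 5),
  prcp  = rnorm(n, mean = 400, sd = 100)
)

# Simulated anomaly variables; fake_anom is deliberately tied to tmean
anoms <- data.frame(
  tmin_anom = rnorm(n),
  fake_anom = 0.2 * climate$tmean + rnorm(n, sd = 0.3)
)

# Rows are climate predictors, columns are anomaly variables
cross_cor <- cor(climate, anoms)

# Anomaly variables that exceed the cutoff with any climate predictor
exceeds         <- apply(abs(cross_cor) > 0.7, 2, any)
anoms_to_review <- names(exceeds)[exceeds]
```

Anomaly variables that pass this check can then be screened against one another in the same way as the climate and soils variables.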

CONUS

The following variables were removed due to correlation with other variables:

  • ‘tmean_anom’ and ‘VPD_mean_anom’: cor = .85; chose ‘VPD_mean_anom’
  • ‘VPD_mean_anom’ and ‘tmin_anom’: cor = .71; chose ‘tmin_anom’
  • ‘VPD_max_anom’ and ‘t_warm_anom’: cor = .82; chose ‘t_warm_anom’
  • ‘t_cold_anom’ and ‘VPD_min_anom’: cor = .79; chose ‘VPD_min_anom’
  • ‘frostFreeDays_anom’ and ‘aboveFreezingMonth_anom’: cor = .86; chose ‘aboveFreezingMonth_anom’
  • ‘prcp_anom’ and ‘prcp_wet_anom’: cor = .73; chose ‘prcp_wet_anom’

The remaining variables that are uncorrelated are shown here:

| Climate_And_Soils_Variables | Weather_Anomaly_Variables |
| --- | --- |
| tmean | tmin_anom |
| prcp | tmax_anom |
| prcp_seasonality | t_warm_anom |
| prcpTempCorr | prcp_wet_anom |
| isothermality | precp_dry_anom |
| annWetDegDays | prcp_seasonality_anom |
| sand | prcpTempCorr_anom |
| coarse | aboveFreezingMonth_anom |
| AWHC | isothermality_anom |
| NA | annWatDef_anom |
| NA | annWetDegDays_anom |
| NA | VPD_min_anom |
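
The NA entries in the table arise because the two lists have different lengths and the shorter one is padded. A minimal sketch of that padding follows, using a hypothetical helper `pad_to()` and a shortened subset of the variable names.

```r
# Hypothetical helper: pad a character vector with NA up to length n
pad_to <- function(x, n) c(x, rep(NA_character_, n - length(x)))

# Shortened example lists (a subset of the names in the table above)
clim_vars <- c("tmean", "prcp", "sand")
anom_vars <- c("tmin_anom", "tmax_anom", "t_warm_anom", "prcp_wet_anom")

n_rows <- max(length(clim_vars), length(anom_vars))
var_table <- data.frame(
  Climate_And_Soils_Variables = pad_to(clim_vars, n_rows),
  Weather_Anomaly_Variables   = pad_to(anom_vars, n_rows)
)
var_table
```

The NA cells are therefore placeholders, not missing predictors.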

Shrubland and Grassland

The following variables were removed due to correlation with other variables:

  • ‘VPD_mean_anom’ and ‘tmean_anom’: cor = .86; chose ‘tmean_anom’
  • ‘tmean_anom’ and ‘tmin_anom’: cor = .86; chose ‘tmin_anom’
  • ‘VPD_max_anom’ and ‘t_warm_anom’: cor = .83; chose ‘t_warm_anom’
  • ‘t_cold_anom’ and ‘VPD_min_anom’: cor = .78; chose ‘VPD_min_anom’
  • ‘prcp_anom’ and ‘prcp_wet_anom’: cor = .75; chose ‘prcp_wet_anom’
  • ‘frostFreeDays_anom’ and ‘aboveFreezingMonth_anom’: cor = .74; chose ‘aboveFreezingMonth_anom’

The remaining variables that are uncorrelated are shown here:

| Climate_And_Soils_Variables | Weather_Anomaly_Variables |
| --- | --- |
| tmean | tmin_anom |
| prcp | tmax_anom |
| prcp_seasonality | t_warm_anom |
| prcpTempCorr | prcp_wet_anom |
| isothermality | precp_dry_anom |
| sand | prcp_seasonality_anom |
| coarse | prcpTempCorr_anom |
| carbon | aboveFreezingMonth_anom |
| AWHC | isothermality_anom |
| NA | annWatDef_anom |
| NA | annWetDegDays_anom |
| NA | VPD_min_anom |

Forest

The following variables were removed due to correlation with other variables:

  • ‘tmean_anom’ and ‘tmin_anom’: cor = .88; chose ‘tmin_anom’
  • ‘tmin_anom’ and ‘VPD_mean_anom’: cor = .74; chose ‘VPD_mean_anom’
  • ‘VPD_mean_anom’ and ‘VPD_max_anom’: cor = .76; chose ‘VPD_max_anom’
  • ‘VPD_max_anom’ and ‘t_warm_anom’: cor = .83; chose ‘t_warm_anom’
  • ‘frostFreeDays_anom’ and ‘aboveFreezingMonth_anom’: cor = .87; chose ‘aboveFreezingMonth_anom’
  • ‘t_cold_anom’ and ‘VPD_min_anom’: cor = .82; chose ‘VPD_min_anom’

The remaining variables that are uncorrelated are shown below:

| Climate_And_Soils_Variables | Weather_Anomaly_Variables |
| --- | --- |
| tmean | tmax_anom |
| prcp | prcp_anom |
| prcp_dry | t_warm_anom |
| prcpTempCorr | t_cold_anom |
| isothermality | prcp_wet_anom |
| annWatDef | precp_dry_anom |
| clay | prcp_seasonality_anom |
| sand | prcpTempCorr_anom |
| coarse | aboveFreezingMonth_anom |
| carbon | isothermality_anom |
| AWHC | annWatDef_anom |
| NA | annWetDegDays_anom |
| NA | VPD_min_anom |